Search for: All records

Creators/Authors contains: "Zhou, Mengxi"


  1. Automating the annotation of scanned documents is challenging, requiring a balance between computational efficiency and accuracy. DocParseNet addresses this by combining deep learning and multi-modal learning to process both text and visual data. The model goes beyond traditional OCR and semantic segmentation, capturing the interplay between text and images to preserve contextual nuances in complex document structures. Our evaluations show that DocParseNet significantly outperforms conventional models, achieving mIoU scores of 49.12 on the validation set and 49.78 on the test set, a 58% accuracy improvement over state-of-the-art baseline models and an 18% gain over the UNext baseline. Remarkably, DocParseNet achieves these results with only 2.8 million parameters, roughly a 25-fold reduction in model size and a 5-fold training speedup compared to other models. These metrics, together with a computational cost of 0.039 TFLOPs (batch size 1), underscore DocParseNet's efficiency in document annotation. The model's adaptability and scalability make it well suited to real-world corporate document processing. (A sketch of the mIoU metric appears after this list.)
  2. Adaptive optics-optical coherence tomography (AO-OCT) allows for the three-dimensional visualization of retinal ganglion cells (RGCs) in the living human eye. Quantitative analyses of RGCs have significant potential for improving the diagnosis and monitoring of diseases such as glaucoma. Recent advances in machine learning (ML) have made possible the automatic identification and analysis of RGCs within the complex three-dimensional retinal volumes obtained with such imaging. However, the current state-of-the-art ML approach relies on fully supervised training, which demands large amounts of training labels; each volume requires many hours of expert manual annotation. Here, two semi-supervised training schemes are introduced, (i) cross-consistency training and (ii) cross pseudo supervision, that utilize unlabeled AO-OCT volumes together with a minimal set of labels, vastly reducing the labeling demands. Moreover, these methods outperformed their fully supervised counterpart and achieved accuracy comparable to that of human experts. (A sketch of the cross pseudo supervision loss appears after this list.)
  3. Purpose: This study aims to explore how network visualization provides opportunities for learners to explore data literacy concepts using locally and personally relevant data. Design/methodology/approach: The researchers designed six locally relevant network visualization activities to support students' data reasoning practices toward understanding aggregate patterns in data. Cultural historical activity theory (Engeström, 1999) guides the analysis to identify how network visualization activities mediate students' emerging understanding of aggregate data sets. Findings: Pre/posttest results indicate that the implementation positively impacted students' understanding of network visualization concepts: students were able to identify and interpret key relationships in novel networks. Interaction analysis (Jordan and Henderson, 1995) of video data revealed how the activities mediated students' improved ability to interpret network data. Challenges noted in other studies, such as students' tendency to focus on familiar concepts, also appeared here, and teachers supported conversations that helped students move beyond them. Originality/value: To the best of the authors' knowledge, this is the first study to support elementary students in exploring data literacy through network visualization. The authors discuss how network visualizations and locally and personally meaningful data provide opportunities for learning data literacy concepts across the curriculum.
  4. While there is increased interest in using movement and embodiment to support learning, owing to the rise of theories of embodied cognition and learning, additional work is needed to explore how students collectively develop their understanding within a mixed-reality environment. In this paper, we examine the individual and collective functions of embodied communication as a way of seeing students' learning through embodiment. We analyze data from a mixed-reality (MR) environment, Science through Technology Enhanced Play (STEP) (Danish et al., International Journal of Computer-Supported Collaborative Learning 15:49–87, 2020), using descriptive statistics and interaction analysis to explore the role of gesture and movement in student classroom activities and in pre- and post-interviews. The results reveal that students develop gestures for representing challenging concepts within the classroom and then use these gestures to clarify their understanding in the interview context. We further explore how students collectively develop these gestures in the classroom, with a focus on their communicative acts; provide a list of individual and collective functions supported by student gestures and embodiment within the STEP MR environment; and discuss the functions of each act. Finally, we illustrate the value of attending to these gestures for educators and designers interested in supporting embodied learning.
  5. Adaptive optics imaging has enabled enhanced in vivo visualization of individual cone and rod photoreceptors in the retina. Effective analysis of such high-resolution, feature-rich images requires automated, robust algorithms. This paper describes RC-UPerNet, a novel deep learning algorithm for identifying both types of photoreceptors; it was evaluated on images from the central and peripheral retina extending out to 30° from the fovea in the nasal and temporal directions. Precision, recall, and Dice scores were 0.928, 0.917, and 0.922, respectively, for cones, and 0.876, 0.867, and 0.870 for rods. The scores agree well with those of human graders and surpass previously reported AI-based approaches. (A sketch of the Dice metric appears after this list.)
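
Item 1 reports mean Intersection-over-Union (mIoU) as its headline metric. For reference, here is a minimal Python sketch of how mIoU is conventionally computed for semantic segmentation; the function name and the handling of absent classes are illustrative assumptions, not code from the paper.

    import numpy as np

    def mean_iou(pred, target, num_classes):
        # pred and target are integer class maps of identical shape.
        ious = []
        for c in range(num_classes):
            pred_c = pred == c
            target_c = target == c
            union = np.logical_or(pred_c, target_c).sum()
            if union == 0:
                continue  # class absent from both maps; skip rather than penalize
            inter = np.logical_and(pred_c, target_c).sum()
            ious.append(inter / union)
        return float(np.mean(ious))

An mIoU of 49.12 on this scale means the predicted and ground-truth regions overlap, on average across classes, by about half of their union.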
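Item 2 names cross pseudo supervision as one of its two semi-supervised schemes. In the published formulation of that technique (Chen et al., CVPR 2021), two differently initialized networks train each other: each network's hard pseudo-labels supervise the other on unlabeled data. The PyTorch sketch below illustrates that general idea only; the abstract does not specify how the loss was adapted to 3D AO-OCT volumes.

    import torch.nn.functional as F

    def cps_loss(logits_a, logits_b):
        # Hard pseudo-labels from each network; detach blocks gradients
        # through the pseudo-label branch.
        pseudo_a = logits_a.argmax(dim=1).detach()
        pseudo_b = logits_b.argmax(dim=1).detach()
        # Each network is trained against the other's pseudo-labels.
        return F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)

The total objective would typically be supervised cross-entropy on the small labeled set plus a weighted cps_loss term on the unlabeled volumes.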
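Item 5 evaluates RC-UPerNet with precision, recall, and Dice scores. For readers unfamiliar with the last of these, the sketch below computes the Dice coefficient, 2|A∩B| / (|A| + |B|), for a pair of binary masks; the empty-mask convention is an assumption, not taken from the paper.

    import numpy as np

    def dice_score(pred_mask, true_mask):
        pred_mask = np.asarray(pred_mask, dtype=bool)
        true_mask = np.asarray(true_mask, dtype=bool)
        denom = pred_mask.sum() + true_mask.sum()
        if denom == 0:
            return 1.0  # both masks empty: treat as perfect agreement
        return 2.0 * np.logical_and(pred_mask, true_mask).sum() / denom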